
notebooks: Quickstart for model documentation edit #372

Merged
validbeck merged 28 commits into main from beck/sc-7911/edit-code-samples-notebooks-quickstart-for on May 21, 2025


Conversation

@validbeck validbeck (Collaborator) commented May 14, 2025

Pull Request Description

sc-7911

What

  • As part of our initiative to clean up and fortify our Jupyter Notebooks, the model documentation quickstart has been tweaked to be cleaner and to give beginner users more context.
  • There is now a new "quickstart" directory in notebooks/, and the README has been updated to accommodate it:

Screenshot 2025-05-14 at 11 29 11 AM

Why

Our notebooks really need some TLC, and this is the first stepping stone. Cleaning up this notebook also allows us to build a complementary "Quickstart for model validation" next.

How to Test

  1. Pull down this PR: gh pr checkout 372
  2. Open notebooks/quickstart/quickstart_model_documentation.ipynb to review and run.

Pull Request Dependencies

Changes to the notebooks were also pulled into:

External Release Notes

Want to get started with documenting models with the ValidMind Library? Check out our updated Quickstart for model documentation notebook:

  • Learn the basics of using ValidMind to document models as part of a model development workflow.
  • Set up the ValidMind Library in your environment, and generate a draft of documentation using ValidMind tests for a binary classification model.

Deployment Notes

Refer to the above section "Pull Request Dependencies."

Breaking Changes

Note

This removes the old notebooks/code_samples/quickstart_customer_churn_full_suite.ipynb file, which is replaced by the new notebook and directory.

Links have been fixed in both validmind-library and documentation in the two PRs above.

Screenshots/Videos (Frontend Only)

n/a

Checklist

  • PR body describes what, why, and how to test
  • Release notes written
  • Deployment notes written: N/A
  • Breaking changes identified
  • Labels applied
  • PR linked to Shortcut
  • Screenshots/videos added (Frontend): N/A
  • Unit tests added (Backend): N/A
  • Tested locally
  • Documentation updated (if required)

Areas Needing Special Review

I expanded and broke down the following sections because the original was very compressed, which made it hard to understand why we were performing those tasks. Since I am not a model developer or modeling expert, someone should double-check that the explanations provided are accurate and relevant for the following:

  • Preprocessing the raw dataset
    Screenshot 2025-05-14 at 11 32 04 AM
  • Training an XGBoost classifier model
    Screenshot 2025-05-14 at 11 32 14 AM

Additional Notes

n/a

@validbeck validbeck self-assigned this May 14, 2025
@validbeck validbeck added the enhancement New feature or request label May 14, 2025
@LoiAnsah LoiAnsah (Contributor) left a comment

Review for "Preprocessing the Raw Dataset":

Note: I tried to quote specific sections and suggest an alternative using "->".

- For "Split the dataset":

"Next...ValidMind" -> Before running tests with ValidMind, we will need to preprocess the dataset. This involves splitting the data and separating the features (inputs) from the targets (outputs).

"Use preprocess()... parts" -> Use preprocess() to split our dataset into three subsets:
"train_df...model." -> Used to train the model. ("train" because it is the standard term in ML.)
"validation_df...trained" -> Used to evaluate the model's performance during training.
"test_df...data" -> Used later on to assess the model's performance on new, unseen data.
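For anyone less familiar with the three-way split, here is a minimal sketch of the idea. It does not use the notebook's preprocess() helper; the split_dataset() function, the 60/20/20 ratios, and the "churned" column are hypothetical and chosen only for illustration.

```python
import random


def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle rows, then split them into train, validation, and test subsets.

    The 60/20/20 default ratios are illustrative assumptions, not the
    ratios used by the notebook's preprocess() helper.
    """
    if not 0 < train_frac + val_frac < 1:
        raise ValueError("train_frac + val_frac must leave room for a test set")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains becomes the test set
    return train, validation, test


# Hypothetical toy data: ten customers with a made-up "churned" label.
rows = [{"customer_id": i, "churned": i % 2} for i in range(10)]
# These play the roles of train_df, validation_df, and test_df (as plain lists here).
train_set, validation_set, test_set = split_dataset(rows)
print(len(train_set), len(validation_set), len(test_set))  # 6 2 2
```

Every row lands in exactly one subset, so the test set stays unseen until the final evaluation.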

For "Separate features and targets":

My suggestion:

To train the model, we need to provide it with:

  1. Inputs - ....
  2. Outputs (Expected answers/labels) - in our case, we would like to know whether the customer churned or not

Note: I believe there is a "to" missing before "hold".
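A minimal sketch of separating inputs from expected answers, assuming each row is a plain dict and that the target column is called "churned" (a hypothetical name; the notebook's actual column may differ):

```python
def split_features_target(rows, target_column):
    """Separate rows into input features (x) and expected labels (y)."""
    # Inputs: every column except the target.
    x = [{k: v for k, v in row.items() if k != target_column} for row in rows]
    # Outputs: the label we want the model to predict (did the customer churn?).
    y = [row[target_column] for row in rows]
    return x, y


# Hypothetical toy rows; "churned" is an assumed target column name.
rows = [
    {"age": 42, "balance": 1200.0, "churned": 1},
    {"age": 35, "balance": 0.0, "churned": 0},
]
x_train, y_train = split_features_target(rows, target_column="churned")
print(x_train)  # [{'age': 42, 'balance': 1200.0}, {'age': 35, 'balance': 0.0}]
print(y_train)  # [1, 0]
```

With a pandas DataFrame, the same idea is usually written as df.drop(columns=[target_column]) for the features and df[target_column] for the target.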

Review for "Training an XGBoost classifier model":

error - Measures how....
logloss - Indicates how...
auc - Evaluates how...

Note: I simply added action verbs.
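The three metric names match XGBoost's eval_metric options error, logloss, and auc. To give some intuition for what each one measures, here are plain-Python re-implementations run on a tiny made-up example. This is a sketch for understanding only, not XGBoost's own code, and the sample labels and probabilities are hypothetical.

```python
import math


def error_rate(y_true, y_prob, threshold=0.5):
    """Fraction of examples misclassified when probabilities above threshold count as positive."""
    wrong = sum(int(p > threshold) != y for y, p in zip(y_true, y_prob))
    return wrong / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood; confident wrong predictions are penalized heavily."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = churned) and predicted churn probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.3, 0.4, 0.6]
print(error_rate(y_true, y_prob))  # 0.5
print(log_loss(y_true, y_prob))
print(auc(y_true, y_prob))  # 0.75
```

Lower is better for error and logloss; higher is better for auc. Unlike error, auc does not depend on a fixed 0.5 threshold, only on how well the model ranks churners above non-churners.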

@validbeck validbeck (Collaborator, Author) commented May 14, 2025

@LoiAnsah These are excellent suggestions. May I suggest you make them official? ;)

GitHub: About reviewing pull requests (EDIT: Oops, forgot the link!)

Optionally, to suggest a specific change to the line or lines, click the suggestion icon (see image), then edit the text within the suggestion block.

[image]

You may run into something interesting when you look at the .ipynb file online; the Jupyter Notebooks primer I wrote, available under the intern guides, may explain some of the oddness. Give it a try anyhow!

@validbeck validbeck requested a review from LoiAnsah May 14, 2025 22:20
@LoiAnsah LoiAnsah (Contributor)

@validbeck Will make sure to add them!

@CLAassistant CLAassistant commented May 15, 2025

CLA assistant check
All committers have signed the CLA.

@validbeck validbeck (Collaborator, Author)

@LoiAnsah Pushing up a commit is one way to suggest changes; good job figuring it out! But I actually wanted you to try this feature, to make sure you understood how to use it (and this way, the person owning the PR gets to decide whether or not to apply the changes):

Optionally, to suggest a specific change to the line or lines, click the suggestion icon (see image), then edit the text within the suggestion block.

[image]

I'm going to revert the PR to the previous commit, so you can try the "suggestion" feature again. :)

@validbeck validbeck force-pushed the beck/sc-7911/edit-code-samples-notebooks-quickstart-for branch from 7b59545 to 191b90a on May 15, 2025 16:46
@LoiAnsah LoiAnsah (Contributor) left a comment

I added my suggestions :)

validbeck and others added 2 commits May 16, 2025 09:20
Co-authored-by: Lois Ansah <133300328+LoiAnsah@users.noreply.github.com>
Co-authored-by: Lois Ansah <133300328+LoiAnsah@users.noreply.github.com>
validbeck and others added 4 commits May 16, 2025 09:21
Co-authored-by: Lois Ansah <133300328+LoiAnsah@users.noreply.github.com>
Co-authored-by: Lois Ansah <133300328+LoiAnsah@users.noreply.github.com>
Co-authored-by: Lois Ansah <133300328+LoiAnsah@users.noreply.github.com>
Co-authored-by: Lois Ansah <133300328+LoiAnsah@users.noreply.github.com>
@github-actions (Contributor)

PR Summary

This pull request refactors the organization of Jupyter notebooks within the project, specifically focusing on the quickstart guide for model documentation using ValidMind. The changes include:

  1. Reorganization of Notebooks: The quickstart_customer_churn_full_suite.ipynb notebook has been removed and its content has been relocated to a new notebook named quickstart_model_documentation.ipynb under the notebooks/quickstart directory. This change aims to improve the logical organization of the notebooks by grouping quickstart guides together.

  2. Updates to Documentation References: References within the notebooks have been updated to reflect the new location of the quickstart guide. This includes updates in markdown cells and code comments to ensure that users are directed to the correct resources.

  3. Script Adjustments: The run_e2e_notebooks.py script has been updated to reflect the new path of the quickstart notebook, ensuring that the end-to-end tests continue to function correctly with the relocated notebook.

  4. Minor Documentation Enhancements: Some markdown cells have been enhanced with additional explanations and links to external resources, such as the Pandas DataFrame documentation, to provide users with more context and learning resources.

These changes are intended to enhance the usability and maintainability of the project by improving the organization and clarity of the documentation resources.

Test Suggestions

  • Run the run_e2e_notebooks.py script to ensure all notebooks execute without errors.
  • Verify that all links and references within the notebooks point to the correct locations after the reorganization.
  • Check that the new quickstart_model_documentation.ipynb notebook functions as expected and produces the correct outputs.
  • Ensure that the documentation enhancements provide clear and accurate information to users.

validbeck and others added 5 commits May 16, 2025 09:21
Co-authored-by: Lois Ansah <133300328+LoiAnsah@users.noreply.github.com>
Co-authored-by: Lois Ansah <133300328+LoiAnsah@users.noreply.github.com>
Co-authored-by: Lois Ansah <133300328+LoiAnsah@users.noreply.github.com>
Co-authored-by: Lois Ansah <133300328+LoiAnsah@users.noreply.github.com>
Co-authored-by: Lois Ansah <133300328+LoiAnsah@users.noreply.github.com>
@github-actions (Contributor)

PR Summary

This pull request refactors the structure of the Jupyter notebooks used in the ValidMind project. The primary change involves relocating the quickstart_customer_churn_full_suite.ipynb notebook to a new location and renaming it to quickstart_model_documentation.ipynb. This change is reflected in the scripts/run_e2e_notebooks.py file, which now points to the new location of the notebook. Additionally, the README and other documentation files have been updated to reflect this change.

The PR also includes minor updates to the documentation within the notebooks, such as clarifying the description of a Pandas DataFrame and ensuring consistent terminology (e.g., changing 'test' to 'testing' datasets). These changes aim to improve the clarity and usability of the documentation for users.

Overall, this PR enhances the organization and readability of the project documentation, making it easier for users to follow the quickstart guide for model documentation using ValidMind.

Test Suggestions

  • Run the relocated quickstart_model_documentation.ipynb notebook to ensure it executes without errors.
  • Verify that the scripts/run_e2e_notebooks.py script correctly identifies and runs the relocated notebook.
  • Check all links and references in the documentation to ensure they point to the correct notebook locations.
  • Review the updated documentation for clarity and accuracy.

@github-actions (Contributor)

PR Summary

This pull request refactors the structure of the Jupyter notebooks used in the project, specifically focusing on the quickstart guide for model documentation using ValidMind. The main changes include:

  1. Relocation and Renaming: The quickstart_customer_churn_full_suite.ipynb notebook has been removed and replaced with a new notebook quickstart_model_documentation.ipynb located in the notebooks/quickstart directory. This change aims to better organize the notebooks and make the quickstart guide more accessible.

  2. Content Updates: The new quickstart_model_documentation.ipynb notebook includes updated content and structure to guide users through the process of documenting models using ValidMind. It covers importing datasets, initializing the ValidMind library, setting up the environment, and running a full suite of tests.

  3. Documentation and Links: The notebook now includes more detailed explanations and links to relevant documentation, making it easier for users to understand the steps involved in model documentation.

  4. Script Update: The run_e2e_notebooks.py script has been updated to reflect the new path of the quickstart notebook, ensuring that the end-to-end tests are executed on the correct files.

  5. Minor Textual Changes: Some minor textual changes have been made across various notebooks to improve clarity and consistency, such as updating references to Pandas DataFrame and correcting terminology.

These changes aim to improve the usability and organization of the project’s documentation resources, making it easier for users to get started with ValidMind.

Test Suggestions

  • Run the updated quickstart_model_documentation.ipynb notebook to ensure all steps execute without errors.
  • Verify that the links to external resources and documentation within the notebook are correct and accessible.
  • Check that the run_e2e_notebooks.py script correctly executes the relocated quickstart notebook.
  • Ensure that the new notebook structure and content are clear and provide a comprehensive guide for new users.
  • Test the notebook in different Python environments to ensure compatibility, especially with the recommended Python versions.

@validbeck validbeck requested a review from LoiAnsah May 16, 2025 16:30
@validbeck validbeck (Collaborator, Author) commented May 16, 2025

@LoiAnsah Thank you for the detailed suggestions! Next, you want to double-check that the new changes look good, then press the big ol' "Approve" button:

Approving a pull request with required reviews

@LoiAnsah LoiAnsah (Contributor) left a comment

Looks great to me!

@validbeck validbeck (Collaborator, Author)

@LoiAnsah Thank you for helping review this PR; you did awesome!

@github-actions (Contributor)

PR Summary

This pull request refactors the structure of the Jupyter notebooks used for demonstrating the ValidMind library. The main changes include:

  1. Relocation and Renaming: The quickstart_customer_churn_full_suite.ipynb notebook has been removed and replaced with a new notebook quickstart_model_documentation.ipynb located in the notebooks/quickstart directory. This change aims to better organize the quickstart guides and improve clarity.

  2. Documentation Updates: The README and other markdown cells within the notebooks have been updated to reflect the new structure and provide clearer instructions. This includes updating links and descriptions to ensure consistency with the new notebook structure.

  3. Code and Text Enhancements: Minor text edits have been made across various notebooks to improve clarity and consistency, such as adding more detailed descriptions of Pandas DataFrames and ensuring consistent terminology (e.g., 'testing datasets' instead of 'test datasets').

  4. Script Update: The run_e2e_notebooks.py script has been updated to reflect the new path of the quickstart notebook, ensuring that the end-to-end tests run the correct files.

Test Suggestions

  • Run the quickstart_model_documentation.ipynb notebook to ensure it executes without errors and produces the expected outputs.
  • Verify that all links in the updated README and notebooks point to the correct resources.
  • Check the run_e2e_notebooks.py script to ensure it correctly executes the relocated notebook.
  • Review the markdown content for clarity and accuracy in the context of the new notebook structure.

@validbeck validbeck merged commit bb08dc2 into main May 21, 2025
6 checks passed
@validbeck validbeck deleted the beck/sc-7911/edit-code-samples-notebooks-quickstart-for branch May 21, 2025 17:12

Labels

enhancement New feature or request

3 participants